DOCUMENT RESUME

ED 067 407    TM 001 802

AUTHOR: Boldt, Robert F.
TITLE: An Estimation Procedure for the Rasch Model Allowing Missing Data.
INSTITUTION: Educational Testing Service, Princeton, N.J.
REPORT NO: RM-72-5
PUB DATE: Apr 72
NOTE: 16p.

EDRS PRICE: MF-$0.65 HC-$3.29
DESCRIPTORS: Data Analysis; *Item Analysis; *Mathematical Models; *Measurement Techniques; *Statistical Studies; *Test Interpretation
IDENTIFIERS: *Rasch Model



ABSTRACT 

The Rasch model and other latent trait models 
encounter some difficulty when faced with an appreciable amount of 
missing data or omitting behavior. The present note assumes that some 
reasonable missing data model has been formulated which does not 
involve the parameters associated with the latent ability of 
interest. A maximum likelihood function is used that is based on 
probabilities which are conditional on the occurrence of a response. 

(Author/DB) 






RM-72-5

U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
OFFICE OF EDUCATION
THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE OF EDUCATION POSITION OR POLICY.

RESEARCH MEMORANDUM

AN ESTIMATION PROCEDURE FOR THE RASCH MODEL ALLOWING MISSING DATA

Robert F. Boldt

This Memorandum is for interoffice use. It is not to be cited as a published report without the specific permission of the author.

Educational Testing Service
Princeton, New Jersey
April 1972




AN ESTIMATION PROCEDURE FOR THE RASCH MODEL
ALLOWING MISSING DATA

Robert F. Boldt 

The Rasch model and other latent trait models encounter some difficulty
when faced with an appreciable amount of missing data or omitting behavior.
Faced with this fact, Wright and Panchapakesan (1969) urge that all examinees
be forced to respond to all items. Alternatively, one might use only the data
from examinees who answer all questions. However, it is probably desirable to
have an estimation procedure that does not depend on complete data. The present
note assumes that some reasonable missing-data model has been formulated that
does not involve the parameters associated with the latent ability of interest.
The maximum likelihood procedure used here is based on probabilities that are
conditional on the occurrence of a response. Let



$i$ be the subscript for items, $i = 1, \ldots, I$;

$t$ be the subscript for persons, $t = 1, \ldots, T$;

$E_i$ be an easiness parameter for item $i$, $E_i \ge 0$;

$A_t$ be an ability parameter for person $t$, $A_t \ge 0$;

$\lambda$ be a Lagrange multiplier used to impose norming on the $A_t$'s;

$\alpha_{it}$ be one (1) if the $t$th examinee responds to the $i$th item, zero otherwise;

$\delta_{it}$ be one (1) if the $t$th examinee responds to the $i$th item correctly, zero otherwise.

Then the joint probability function of the observations ($\delta$'s) given the
pattern of items attempted ($\alpha$'s) is

$$p = \prod_i \prod_t \left( \frac{A_t E_i}{1 + A_t E_i} \right)^{\delta_{it}} \left( \frac{1}{1 + A_t E_i} \right)^{\alpha_{it} - \delta_{it}} \qquad (1)$$
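As an illustration only, Equation (1) can be evaluated directly. The Python sketch below assumes the indicators are stored as item-by-person nested lists `alpha[i][t]` and `delta[i][t]`; the function names `prob_correct` and `conditional_likelihood` are hypothetical, not part of the memorandum.

```python
def prob_correct(a_t, e_i):
    """Probability of a correct response, A_t E_i / (1 + A_t E_i)."""
    return a_t * e_i / (1.0 + a_t * e_i)


def conditional_likelihood(A, E, alpha, delta):
    """Equation (1): joint probability of the deltas given the alphas.

    alpha[i][t] is 1 if person t responded to item i, else 0;
    delta[i][t] is 1 if that response was correct, else 0.
    """
    p = 1.0
    for i, e_i in enumerate(E):
        for t, a_t in enumerate(A):
            pr = prob_correct(a_t, e_i)
            # 1 - pr equals 1 / (1 + A_t E_i); a cell with alpha = 0 has both
            # exponents equal to zero and contributes a factor of one.
            p *= pr ** delta[i][t] * (1.0 - pr) ** (alpha[i][t] - delta[i][t])
    return p


# Two items, two persons; the second person did not respond to the second item.
print(conditional_likelihood([1.0, 2.0], [1.0, 0.5], [[1, 1], [1, 0]], [[1, 0], [0, 0]]))
```

Here the three observed cells contribute $1/2$, $1/3$ and $2/3$, so the product is $1/9$; adding a person with no responses leaves the value unchanged, as the next paragraph argues.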



Examination of Equation (1) indicates that an item attempted by no one,
or a person who attempts no items, has exponents equal to zero over the
entire range of the subscripts and hence cannot affect the value of $p$.
Equation (1) also indicates that for a person who attempts some items and
gets none correct, the value of $p$ is at a maximum if $A$ is zero, since each
denominator $1 + AE$ is then at its least value, unity. Similarly, an item that
is attempted by some but answered correctly by none would receive a parametric
value of zero. However, an item correctly answered by all who attempted it, or a
person who correctly answers all items attempted, would have infinite values
associated with their parameters, since for those situations the quantity
$AE/(1 + AE)$ must be unity. Finally, one may note that if $AE/(1 + AE)$
is unity, then $1/(1 + AE)$ is zero, since the two must add to unity. Hence
$A$ may not be taken as infinite for a person who misses any item, as that would
minimize $p$, making it zero due to the multiplication by a zero. Similarly,
$A$ may not be taken as zero for any person who gets an item right for, again,
$p$ would become zero and hence not be maximized. Therefore, the value of $A$
assigned to a person who gets items both correct and incorrect would be
neither zero nor infinite, and by a similar argument the value of $E$ assigned
to an item which is answered both correctly and incorrectly would be neither
zero nor infinite. The remaining extreme condition, that where $A$ is zero
and $E$ is infinite, or vice versa, cannot occur since it requires that the
examinee both succeed and fail on the item.

However, what can occur that would confuse matters is that an item could be
missed by all who fail to answer every item correctly. Then, without using the
data for those who got all items right, one would estimate the item parameter
to be zero. But we do not believe that the item parameter should be zero, nor
that the people who get all the items correct should get infinity as their
ability parameters. The fact is that the data do not support estimation of item
or person parameters in some cases. To provide for this possibility, a
preliminary procedure is introduced which eliminates items and people whose
parameters do not meet the criteria that each item is tried by people who both
pass and fail it, and that each person tries items and gets some right and
misses some. This procedure searches for people who miss all or get all, or
items that are always passed or always missed, throws them out, and records that
the throwout occurred on the first search. Then a second search is conducted for
perfectly good or bad performances among the remaining items and people, and
those found are thrown out, indicating that the throwout occurred on the second
search. The searches continue, always recording the number of the search on
which an item or person is eliminated. Thus, when told that an item was perfect
on the first search and that a person was a perfect failure on the fourth search,
one would know that the person got the item correct if he attempted it. More
precise estimation for items and persons such as these seems impossible.
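A quick numerical check of the extreme cases, written as a hypothetical Python helper (`person_log_factor` is not from the memorandum): for a person who answers every attempted item correctly, the log of that person's factor in Equation (1) keeps increasing as $A$ grows, so no finite maximum exists; for a person who misses every attempted item, it increases as $A$ shrinks toward zero.

```python
import math


def person_log_factor(a_t, E, alpha_t, delta_t):
    """Log of person t's factor in Equation (1); requires a_t > 0 and all E_i > 0."""
    total = 0.0
    for e_i, att, cor in zip(E, alpha_t, delta_t):
        pr = a_t * e_i / (1.0 + a_t * e_i)
        total += cor * math.log(pr) + (att - cor) * math.log(1.0 - pr)
    return total


E = [0.5, 1.0, 2.0]
alpha_t = [1, 1, 0]            # the third item was not attempted
all_right = [1, 1, 0]
all_wrong = [0, 0, 0]
grid = [0.01, 0.1, 1.0, 10.0, 100.0]

# right_curve rises toward 0 as A grows; wrong_curve falls as A grows.
right_curve = [person_log_factor(a, E, alpha_t, all_right) for a in grid]
wrong_curve = [person_log_factor(a, E, alpha_t, all_wrong) for a in grid]
print(right_curve)
print(wrong_curve)
```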

Based on the foregoing considerations the following procedures should 
be followed before initiating the likelihood maximization: 

(a) incorporate some provision in the missing data analysis for those 
items answered by no one and delete them from the present analysis; 

(b) incorporate some provision in the missing data analysis for those 
persons who answer no items and delete them from the present analysis; 
(c) assign a parametric value of zero to all items not eliminated in
(a) which are answered incorrectly by everyone attempting them,
indicate search number one, and delete them from the present
analysis in step (g);

(d) assign a parametric value of zero to all examinees not eliminated
in (b) who get no items correct, indicate search number one, and
delete them from the present analysis in step (g);

(e) assign a parametric value which is infinite to those items not
eliminated in (a) but answered correctly by all who attempt them,
indicate search number one, and delete them from the present
analysis in step (g);

(f) assign a parametric value which is infinite to those examinees
not eliminated in (b) but who answer correctly all items attempted,
indicate search number one, and delete them from the present
analysis in step (g);

(g) accomplish the deletions indicated. If there are none, go to the 
optimization procedure. 

When steps (a)-(g) are completed, a reduced collection of people and
items will be left. Some of these people may have gotten all of the remaining 
items correct, or some of the items may have been answered correctly by all 
of the remaining people. Deletion of items may leave some people with no 
responses, or deletion of some people may leave some unattempted items; that 
is, unattempted by those who were not deleted in steps (b) and (g). Similarly, 
some people may have been deleted who were the only ones to respond to certain 
items. With the remaining data carry out the following: 
(h) delete those items for which none of the remaining examinees have
made a response and indicate the search number (2 or more); missing-data
provisions have already been made, so no additional provisions
are needed;

(i) delete those persons who made no response to the remaining items and
indicate the search number (2 or more); missing-data provisions have
already been made, so no additional provisions are needed;

(j) repeat steps (c)-(g), indicating the appropriate search numbers.

Continue cycling from (c) through (j) until no deletion is indicated at step
(g). When this iterative cycling is completed, it is hoped that most examinees
and most items will be left for entry into the optimization procedure.
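The screening steps (a) through (j) can be sketched in Python as follows. This is a minimal, hypothetical implementation (the function `screen` and its return format are illustrative assumptions): on the first search the no-response deletions play the role of steps (a) and (b), and on later searches the role of steps (h) and (i); items and persons flagged in steps (c) through (f) are recorded with their assigned value (0 or infinity) and search number.

```python
import math


def screen(alpha, delta):
    """Iteratively remove items and persons whose parameters cannot be estimated.

    Returns the remaining item and person indices plus two dicts mapping each
    removed index to (assigned value, search number); "no response" marks the
    deletions of steps (a)/(b) or (h)/(i).
    """
    items = set(range(len(alpha)))
    persons = set(range(len(alpha[0])))
    dropped_items, dropped_persons = {}, {}

    def item_counts(i):    # (attempts, correct) among remaining persons
        return (sum(alpha[i][t] for t in persons), sum(delta[i][t] for t in persons))

    def person_counts(t):  # (attempts, correct) on remaining items
        return (sum(alpha[i][t] for i in items), sum(delta[i][t] for i in items))

    search = 1
    while True:
        # Steps (a)/(b) on search 1, (h)/(i) afterwards: no remaining responses.
        for i in [i for i in items if item_counts(i)[0] == 0]:
            items.remove(i)
            dropped_items[i] = ("no response", search)
        for t in [t for t in persons if person_counts(t)[0] == 0]:
            persons.remove(t)
            dropped_persons[t] = ("no response", search)

        # Steps (c)-(f): flag all-wrong (value 0) and all-right (value infinity).
        flagged_items, flagged_persons = {}, {}
        for i in items:
            attempts, correct = item_counts(i)
            if correct == 0:
                flagged_items[i] = 0.0
            elif correct == attempts:
                flagged_items[i] = math.inf
        for t in persons:
            attempts, correct = person_counts(t)
            if correct == 0:
                flagged_persons[t] = 0.0
            elif correct == attempts:
                flagged_persons[t] = math.inf

        # Step (g): if nothing was flagged, the data are ready for optimization.
        if not flagged_items and not flagged_persons:
            return items, persons, dropped_items, dropped_persons
        for i, value in flagged_items.items():
            items.remove(i)
            dropped_items[i] = (value, search)
        for t, value in flagged_persons.items():
            persons.remove(t)
            dropped_persons[t] = (value, search)
        search += 1


alpha = [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]
delta = [[0, 1, 0, 1], [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 0]]
print(screen(alpha, delta))
```

In this small example the fourth item is never attempted, the first person misses everything and the fourth person answers everything correctly on search 1, and the third item then becomes perfect on search 2 once those two persons are gone.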

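Assuming that the subscripts $i$ and $t$ range over values defined
only for the remaining data, the log-likelihood function $L$, with norming
of the $A$'s imposed to eliminate a multiplicative indeterminacy, is

$$L = \sum_i \sum_t \delta_{it} \ln(A_t E_i) - \sum_i \sum_t \alpha_{it} \ln(1 + A_t E_i) + \lambda (A_{\cdot} - V), \qquad (2)$$

where $A_{\cdot} = \sum_t A_t$ (see the Footnote for the dot notation) and $V$ is the norming constant for the $A$'s.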
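The logarithm of Equation (1) gives the first two sums in (2). The hypothetical Python function below (its name and signature are illustrative assumptions) evaluates (2) for positive parameters; for a small example it reproduces the logarithm of the product in (1), which is $\ln(1/9)$ for the data shown.

```python
import math


def log_likelihood(A, E, alpha, delta, lam=0.0, V=None):
    """Equation (2) with the subscripts restricted to the screened data.

    Assumes every A_t and E_i is positive and finite; when V is None the
    norming term lam * (A. - V) is omitted.
    """
    L = 0.0
    for i, e_i in enumerate(E):
        for t, a_t in enumerate(A):
            L += delta[i][t] * math.log(a_t * e_i) - alpha[i][t] * math.log1p(a_t * e_i)
    if V is not None:
        L += lam * (sum(A) - V)
    return L


A, E = [1.0, 2.0], [1.0, 0.5]
alpha, delta = [[1, 1], [1, 0]], [[1, 0], [0, 0]]
print(log_likelihood(A, E, alpha, delta), math.log(1 / 9))
```

`math.log1p` is used for $\ln(1 + A_tE_i)$ because it stays accurate when $A_tE_i$ is small.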



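Then

$$\frac{\partial L}{\partial E_i} = \frac{\delta_{i\cdot}}{E_i} - \sum_t \frac{\alpha_{it} A_t}{1 + A_t E_i}, \qquad (3)$$

$$\frac{\partial L}{\partial A_t} = \frac{\delta_{\cdot t}}{A_t} - \sum_i \frac{\alpha_{it} E_i}{1 + A_t E_i} + \lambda, \qquad (4)$$

$$\frac{\partial L}{\partial \lambda} = A_{\cdot} - V \qquad (5)$$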




are the derivatives needed for the optimization. The iterative procedure
terminating at step (g) above ensures that the $E$'s and $A$'s sought in this
procedure are neither zero nor infinite. Hence the derivatives are not
trivially zero due to infinite parameters, nor does multiplication through
Equations (3) and (4) by $E_i$ or $A_t$ produce zeros trivially. Rather, at the
solution the derivatives are zero because the parameters found are indeed
optimum for the data. Further, the Lagrange multiplier is zero since, from
(3) and (4),

$$\sum_t A_t \frac{\partial L}{\partial A_t} - \sum_i E_i \frac{\partial L}{\partial E_i} = \lambda A_{\cdot} = \lambda V = 0,$$

since at the solution the derivatives are equal to zero, $A_{\cdot} = V$ by (5),
and $V$ is positive. Note also that, given the $E$'s, $A_t$ is the only variable
in the $t$th of Equations (4), and given the $A$'s, $E_i$ is the only variable in
the $i$th of Equations (3). Further, it can be shown that at the solution, for a
fixed set of $A$'s, the second derivatives with respect to the $E$'s are negative,
as are the second derivatives with respect to the $A$'s for fixed $E$'s, and
hence the optima are maxima.
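The derivatives (3) and (4), and the argument that the multiplier vanishes, can be checked numerically. In the hypothetical Python sketch below (function names are illustrative), `multiplier_identity` computes $\sum_t A_t\,\partial L/\partial A_t - \sum_i E_i\,\partial L/\partial E_i$; it equals $\lambda A_{\cdot}$ at any positive parameter values, so at a stationary point, where the derivatives vanish, $\lambda A_{\cdot} = \lambda V = 0$.

```python
import math


def log_lik(A, E, alpha, delta, lam=0.0, V=0.0):
    """Equation (2) for positive A's and E's."""
    L = sum(delta[i][t] * math.log(A[t] * E[i]) - alpha[i][t] * math.log1p(A[t] * E[i])
            for i in range(len(E)) for t in range(len(A)))
    return L + lam * (sum(A) - V)


def grad_E(i, A, E, alpha, delta):
    """Equation (3): delta_i./E_i - sum_t alpha_it A_t / (1 + A_t E_i)."""
    return sum(delta[i]) / E[i] - sum(alpha[i][t] * a / (1.0 + a * E[i]) for t, a in enumerate(A))


def grad_A(t, A, E, alpha, delta, lam=0.0):
    """Equation (4): delta_.t/A_t - sum_i alpha_it E_i / (1 + A_t E_i) + lambda."""
    d_t = sum(row[t] for row in delta)
    return d_t / A[t] - sum(alpha[i][t] * e / (1.0 + A[t] * e) for i, e in enumerate(E)) + lam


def multiplier_identity(A, E, alpha, delta, lam):
    """sum_t A_t dL/dA_t - sum_i E_i dL/dE_i, which equals lam * A. identically."""
    s_A = sum(a * grad_A(t, A, E, alpha, delta, lam) for t, a in enumerate(A))
    s_E = sum(e * grad_E(i, A, E, alpha, delta) for i, e in enumerate(E))
    return s_A - s_E


A, E = [0.8, 1.5, 2.0], [0.5, 1.2]
alpha = [[1, 1, 0], [1, 1, 1]]
delta = [[1, 0, 0], [0, 1, 1]]
lam = 0.3
# The two printed numbers agree up to rounding error.
print(multiplier_identity(A, E, alpha, delta, lam), lam * sum(A))
```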

Further, for fixed $A$'s, which are assumed to be positive, there is only
one optimum value of an $E_i$ on the positive half-axis. To show this, note
that if $E_i^{-1}$ is factored out of the right-hand side of (3), the resulting
expression contains terms of the form $AE/(1 + AE)$, which are monotonically
increasing in $E$. Hence the derivative is zero for only one value of $E_i$,
that is, when

$$\delta_{i\cdot} = \sum_t \frac{\alpha_{it} A_t E_i}{1 + A_t E_i}.$$

Existence of the optimum is assured by steps (a)-(j), since zero and infinite
values for $E$ lead to zero values for $p$, and the function is positive at
intermediate values (of either the $A$'s or the $E$'s).
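Because the right-hand side of the equation above increases monotonically in $E_i$, its unique root can be bracketed and found by bisection. The memorandum itself uses Newton iterations; the Python sketch below swaps in plain bisection (on the logarithm of $E_i$) purely to illustrate the uniqueness argument, and the name `solve_E_bisection` and its bracket limits are assumptions.

```python
import math


def solve_E_bisection(delta_i, alpha_i, A, lo=1e-12, hi=1e12, iters=200):
    """Return the E_i > 0 at which sum_t alpha_it A_t E_i / (1 + A_t E_i) = delta_i.

    Assumes 0 < delta_i. < number of attempts, which screening guarantees,
    and that the root lies between lo and hi.
    """
    target = sum(delta_i)

    def expected_score(e):
        return sum(att * a * e / (1.0 + a * e) for att, a in zip(alpha_i, A))

    for _ in range(iters):
        mid = math.sqrt(lo * hi)   # geometric midpoint, i.e. bisection in log E
        if expected_score(mid) < target:
            lo = mid               # score too low: the root lies above mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


A = [1.0, 2.0, 3.0]
E_hat = solve_E_bisection([1, 1, 0], [1, 1, 1], A)
print(E_hat)
```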

Newton iterations are suggested to optimize $L$. Since the Lagrange
multiplier is zero, $\lambda$ is not represented in the derivatives used, and the
norming of the $A$'s is preserved as a part of the iterative procedure
(Step 4a below). For these iterations the needed second derivatives are
as follows:






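$$\frac{\partial^2 L}{\partial E_i^2} = -\frac{\delta_{i\cdot}}{E_i^2} + \sum_t \frac{\alpha_{it} A_t^2}{(1 + A_t E_i)^2}, \qquad (6)$$

$$\frac{\partial^2 L}{\partial A_t^2} = -\frac{\delta_{\cdot t}}{A_t^2} + \sum_i \frac{\alpha_{it} E_i^2}{(1 + A_t E_i)^2}. \qquad (7)$$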
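A hypothetical Python transcription of (6) and (7) (function names are illustrative). With $P_{it} = A_tE_i/(1 + A_tE_i)$, Equation (6) can be written $E_i^{-2}\left(\sum_t \alpha_{it}P_{it}^2 - \delta_{i\cdot}\right)$; at a solution, where $\delta_{i\cdot} = \sum_t \alpha_{it}P_{it}$, it is negative because each $P_{it}$ is a fraction. The tiny example checks this at a known solution.

```python
def d2L_dE2(i, A, E, alpha, delta):
    """Equation (6): -delta_i./E_i**2 + sum_t alpha_it A_t**2 / (1 + A_t E_i)**2."""
    return -sum(delta[i]) / E[i] ** 2 + sum(
        alpha[i][t] * a ** 2 / (1.0 + a * E[i]) ** 2 for t, a in enumerate(A))


def d2L_dA2(t, A, E, alpha, delta):
    """Equation (7): -delta_.t/A_t**2 + sum_i alpha_it E_i**2 / (1 + A_t E_i)**2."""
    return -sum(row[t] for row in delta) / A[t] ** 2 + sum(
        alpha[i][t] * e ** 2 / (1.0 + A[t] * e) ** 2 for i, e in enumerate(E))


# One item, two persons with A = 1, one correct response: E = 1 solves
# delta_i. = sum_t alpha_it P_it, and (6) there is -1 + 2 * (1/2)**2 = -0.5 < 0.
print(d2L_dE2(0, [1.0, 1.0], [1.0], [[1, 1]], [[1, 0]]))
```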



Note that these derivatives do not involve other parameters of the same kind.
That is, in Equation (6) only one value of the subscript $i$ is involved, so that
$E_j$ is not involved if $j \ne i$. A similar condition obtains for the $A$'s.
Therefore, given the $A$'s, the $E$'s can be found one at a time. Similarly,
given the $E$'s, the $A$'s can be found one at a time. Therefore the
increments needed for the iterations are



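$${}_j\Delta E_i = -\frac{\partial L / \partial E_i}{\partial^2 L / \partial E_i^2}\bigg|_{E_i = {}_jE_i} = \frac{{}_jE_i \left( \delta_{i\cdot} - \sum_t \alpha_{it} \frac{A_t\,{}_jE_i}{1 + A_t\,{}_jE_i} \right)}{\delta_{i\cdot} - \sum_t \alpha_{it} \left( \frac{A_t\,{}_jE_i}{1 + A_t\,{}_jE_i} \right)^2}, \qquad (8)$$

where the prefix $j$'s are included to indicate a value of $E$ at a particular
iteration. In this notation

$${}_{j+1}E_i = {}_jE_i + {}_j\Delta E_i. \qquad (9)$$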
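The expanded form of (8) is algebraically the Newton step $-(\partial L/\partial E_i)/(\partial^2 L/\partial E_i^2)$ built from (3) and (6), after multiplying numerator and denominator by $E_i^2$. A hypothetical Python version (the name `newton_increment_E` is illustrative), with a check against the ratio of the two derivatives:

```python
def newton_increment_E(i, A, E, alpha, delta):
    """Equation (8): E_i * (delta_i. - sum alpha*P) / (delta_i. - sum alpha*P**2),
    where P_it = A_t E_i / (1 + A_t E_i)."""
    e = E[i]
    d_i = sum(delta[i])
    P = [a * e / (1.0 + a * e) for a in A]
    s1 = sum(alpha[i][t] * p for t, p in enumerate(P))
    s2 = sum(alpha[i][t] * p * p for t, p in enumerate(P))
    return e * (d_i - s1) / (d_i - s2)


A, E = [0.8, 1.5, 2.0], [0.5, 1.2]
alpha = [[1, 1, 0], [1, 1, 1]]
delta = [[1, 0, 0], [0, 1, 1]]
step = newton_increment_E(1, A, E, alpha, delta)
E_next = E[1] + step          # Equation (9)
print(step, E_next)
```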



Note that no prefixed subscript has been included for the $A$'s. The $A$'s
do not, of course, remain constant during the entire optimization procedure,
and subscripts for $A$'s will be included for equations which describe
iterations for finding $A$'s. However, when $E$'s are being found, the
$A$'s are not changing. The reader should keep in mind that the $E$'s are found
while the $A$'s are held constant, but after the $E$'s are found, the $A$'s
are modified. The whole procedure stops when all $E$'s and $A$'s satisfy the
normal equations.

The iterations described above can yield negative $E$'s, as can be seen
by examination of Equation (8). Note that the expression in parentheses in
the numerator of (8) contains terms of the form $AE/(1 + AE)$ and, since all
parameters are positive, these terms are fractions. Note also that for each
such term in the numerator there is a corresponding term in the denominator
containing the square of the fraction. Hence when the denominator of (8) is
nearly zero, the numerator is relatively large and negative, and the incremented
value of $E$ may be negative. Such a situation arises for overly large values of
$E$, so when the incremented $E$ appears to be negative, set the incremented $E$
equal to half the previous $E$ rather than using Equation (9). This halving
procedure will eventually yield a denominator which is positive and of
appreciable size and a nonnegative value for the incremented $E$.
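A minimal Python sketch of iterations (8) and (9) for a single $E_i$ with the $A$'s held fixed, including the halving rule and the stopping test $0 \le \Delta \le \varepsilon E_i$ used later in Step 4c. The name `update_E_item` and the iteration cap are assumptions. The sketch only guards against negative values, as the text does; if the denominator of (8) is negative at the starting value, Newton steps in $E$ can move away from the solution, which is why a cap on the number of iterations is included.

```python
def update_E_item(i, A, E, alpha, delta, eps=1e-10, max_iter=500):
    """Iterate (8)-(9) for one E_i with the A's fixed, using the halving rule.

    Stops when 0 <= Delta <= eps * E_i, leaving E_i unchanged on that step.
    """
    e = E[i]
    d_i = sum(delta[i])
    for _ in range(max_iter):
        P = [a * e / (1.0 + a * e) for a in A]
        s1 = sum(alpha[i][t] * p for t, p in enumerate(P))
        s2 = sum(alpha[i][t] * p * p for t, p in enumerate(P))
        step = e * (d_i - s1) / (d_i - s2)     # Equation (8)
        if 0.0 <= step <= eps * e:
            return e
        # Halving rule: never let the incremented E go negative.
        e = e / 2.0 if e + step < 0.0 else e + step
    raise RuntimeError("E_i did not converge; check the starting value")


# From E = 1 the first Newton step is -2, so the halving rule gives E = 0.5,
# which is the exact solution of 3 * E / (1 + E) = 1.
print(update_E_item(0, [1.0, 1.0, 1.0], [1.0], [[1, 1, 1]], [[1, 0, 0]]))
```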

The iterative sequence is shown graphically in Figure 1.

Insert Figure 1 about here

In this figure the derivative of $L$ with respect to $E$ is monotonically
decreasing with a monotonically increasing slope. Iterations are drawn in
starting from three places, $a$, $b$, and $c$. If an iteration begins at point
$a$, the effect of the Newton iteration is to make the next point $a + \Delta a$,
which is to the left of zero because the slope of the function is so flat there.
A point of departure at $b$ also moves to the left, but not too far, and from the
shape of the curve it can be seen that the movement is to a point smaller than
the solution. Finally, a point of departure at $c$ leads to an increase, but not
to more than the solution. Therefore, except for the halving procedure, one
expects at most one decrease in $E$ that does not yield a negative $E$, and after
that a series of positive increments which eventually become so small that one
stops.

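A similar situation exists for the $A$'s, and the halving procedure
should be followed if negative $A$'s are observed following addition of the
increments. The formula for the increment of $A_t$ is as follows:

$${}_j\Delta A_t = -\frac{\partial L / \partial A_t}{\partial^2 L / \partial A_t^2}\bigg|_{A_t = {}_jA_t} = \frac{{}_jA_t \left( \delta_{\cdot t} - \sum_i \alpha_{it} \frac{{}_jA_t E_i}{1 + {}_jA_t E_i} \right)}{\delta_{\cdot t} - \sum_i \alpha_{it} \left( \frac{{}_jA_t E_i}{1 + {}_jA_t E_i} \right)^2}. \qquad (10)$$

As with the $E$'s, the incrementing rule is

$${}_{j+1}A_t = {}_jA_t + {}_j\Delta A_t. \qquad (11)$$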



The iteration procedure will next be outlined, thus introducing notation 
for the stopping rule. That rule will be developed at the end. 

Step 1. Set all $A$'s equal to $V/T$ and all $E$'s equal to $T/V$.

Step 2. Choose $\varepsilon$, the maximum tolerable error for a change in a theoretical proportion ($\varepsilon$ is an upper bound on the change in the theoretical probability of a correct response), and $V$, the norming constant for the $A$'s.

Step 3. Calculate new $A$'s; initialize $t$ at unity and $\eta$ at unity.

Step 3a. Given ${}_jA_t$, calculate ${}_j\Delta A_t = \Delta$. For $\Delta$ use Equation (10).

Step 3b. If $0 \le \Delta \le \varepsilon A_t$, go to 3d. If not, set $\eta = 0$ and go to 3c.

Step 3c. If $A_t + \Delta < 0$, copy $A_t/2$ into $A_t$. If $A_t + \Delta \ge 0$, add $\Delta$ to $A_t$. Then go to 3a.

Step 3d. If $t = T$, go to Step 4. If not, add unity to $t$ and go to 3a.

Step 4. Calculate new $E$'s. Initialize $i$ at unity.

Step 4a. Copy $V A_t / A_{\cdot}$ into $A_t$, for all $t$.

Step 4b. Given ${}_jE_i$, calculate ${}_j\Delta E_i = \Delta$. For $\Delta$ use Equation (8).

Step 4c. If $0 \le \Delta \le \varepsilon E_i$, go to 4e. If not, set $\eta = 0$ and go to 4d.

Step 4d. If $E_i + \Delta < 0$, copy $E_i/2$ into $E_i$. If $E_i + \Delta \ge 0$, add $\Delta$ to $E_i$. Then go to 4b.

Step 4e. If $i = I$, go to Step 4f. If not, add unity to $i$ and go to 4b.

Step 4f. If $\eta$ equals zero, go to Step 3. If not, exit.
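The following Python sketch strings Steps 1 through 4 together for data that have already been screened. It is an illustration under stated assumptions, not the memorandum's program: the names `fit`, `max_cycles` and `max_inner` are hypothetical, and one deliberate change is made, namely that a step is treated as converged when $|\Delta| \le \varepsilon$ times the parameter rather than requiring $0 \le \Delta$, because in floating-point arithmetic a tiny negative $\Delta$ can otherwise recur without changing the parameter.

```python
def fit(alpha, delta, V=1.0, eps=1e-9, max_cycles=5000, max_inner=200):
    """Alternate Newton updates of the A's (Step 3) and E's (Step 4)."""
    n_items, n_persons = len(alpha), len(alpha[0])
    A = [V / n_persons] * n_persons          # Step 1
    E = [n_persons / V] * n_items
    d_item = [sum(row) for row in delta]
    d_person = [sum(row[t] for row in delta) for t in range(n_persons)]

    def increment(value, d, attempts, others):
        # Equations (8) and (10) share this form, with P = value*o / (1 + value*o).
        P = [o * value / (1.0 + o * value) for o in others]
        s1 = sum(w * p for w, p in zip(attempts, P))
        s2 = sum(w * p * p for w, p in zip(attempts, P))
        return value * (d - s1) / (d - s2)

    def newton(value, d, attempts, others):
        # Steps 3a-3c / 4b-4d; also reports whether the first step was already small.
        small = True
        for _ in range(max_inner):
            step = increment(value, d, attempts, others)
            if abs(step) <= eps * value:
                return value, small
            small = False
            value = value / 2.0 if value + step < 0.0 else value + step   # halving rule
        return value, False

    for _ in range(max_cycles):
        eta = True                                                       # Step 3
        for t in range(n_persons):
            col = [alpha[i][t] for i in range(n_items)]
            A[t], ok = newton(A[t], d_person[t], col, E)
            eta = eta and ok
        total = sum(A)
        A = [V * a / total for a in A]                                   # Step 4a
        for i in range(n_items):
            E[i], ok = newton(E[i], d_item[i], alpha[i], A)
            eta = eta and ok
        if eta:                                                          # Step 4f
            return A, E
    raise RuntimeError("no convergence within max_cycles")


# Three items, four persons; the fourth person did not respond to the third item.
alpha = [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 0]]
delta = [[1, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 0]]
A_hat, E_hat = fit(alpha, delta)
print(A_hat, E_hat)
```

At exit, each item's and each person's expected number correct over the attempted cells matches the observed number correct, and the $A$'s sum to $V$.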

Steps 1 through 4 yield $A$'s and $E$'s that introduce a change of no more
than $\varepsilon$ into a theoretical probability of a correct response. That is,
when a $\Delta$ is negative, it is recycled; when $\Delta$ is positive and small
enough, we want to stop. If an $E$ is being computed, a theoretical probability
of a correct response using the incremented $E$ is

$$\frac{A(E + \Delta)}{1 + A(E + \Delta)}.$$

The error between the proportion using parameters at a given time and the
next time is

$$\frac{A(E + \Delta)}{1 + A(E + \Delta)} - \frac{AE}{1 + AE}. \qquad (12)$$

If the $\Delta$ in the denominator of (12) is set equal to zero and the $\Delta$ in the
numerator is not, we get a quantity which is surely greater than the error
and which is

$$\frac{A\Delta}{1 + AE}.$$

Further, $AE/(1 + AE)$ is surely less than one, and since
$A\Delta/(1 + AE) = (\Delta/E)\,AE/(1 + AE)$, an even larger quantity is $\Delta/E$.
Hence if

$$\Delta / E < \varepsilon,$$

the change is less than $\varepsilon$. Similar logic holds for the change in $A$.
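The two inequalities behind the stopping rule can be spot-checked numerically. The hypothetical helper below (names illustrative) computes the exact change in (12) for a nonnegative increment together with the bounds $A\Delta/(1+AE)$ and $\Delta/E$; the exact change simplifies to $A\Delta/[(1+AE)(1+A(E+\Delta))]$, which is why it can never exceed either bound.

```python
def prob(a, e):
    """Probability of a correct response, A E / (1 + A E)."""
    return a * e / (1.0 + a * e)


def change_and_bounds(a, e, step):
    """For step >= 0, return the exact change in the probability when E becomes
    E + step, the bound A*step/(1 + A*E), and the looser bound step/E."""
    exact = prob(a, e + step) - prob(a, e)
    bound = a * step / (1.0 + a * e)      # Delta in the denominator of (12) set to zero
    looser = step / e                     # because A E / (1 + A E) < 1
    return exact, bound, looser


for a in (0.1, 1.0, 10.0):
    for e in (0.1, 1.0, 10.0):
        for step in (0.0, 1e-3, 0.5, 5.0):
            exact, bound, looser = change_and_bounds(a, e, step)
            assert exact <= bound + 1e-15 and bound <= looser + 1e-15

# With A = E = 1 and Delta = 1: exact change 2/3 - 1/2 = 1/6, bounds 1/2 and 1.
print(change_and_bounds(1.0, 1.0, 1.0))
```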
Footnote

The common dot notation will be used for summation, e.g.,

$$x_{ij\cdot l} = \sum_k x_{ijkl}.$$
Figure Caption

Fig. 1. Second trial values ($a + \Delta a$, $b + \Delta b$, $c + \Delta c$) of $E$
following different first trial values ($a$, $b$, $c$) of $E$.